Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics: Proceedings of the Main Conference
Authors
Abstract
Email is the number one activity that people do on the internet: 74% of internet users check their email on an average day. Email use in offices has more than doubled since 2000 and now exceeds 8 hours a week. Email offers many great NLP problems, such as automatic clustering and foldering, search, prioritization, automatically finding keywords within messages, finding addresses, and summarization.

Spam is the number one problem for email. I'll talk about how spam filters work and the current open problems, as well as other kinds of abuse, such as chat spam (Spat), IM spam (Spim), and blog comment spam (Blat), all of which make great NLP problems. Email and abuse problems like spam can be some of the most exciting for research: they inspire us to work on new problems we would otherwise not have found. We are exploring areas such as adversarial learning, learning with unbalanced costs, and learning with partial user feedback.

Shipping solutions to these problems is both surprisingly hard and surprisingly fun. For NLP researchers, the hardest constraint is that products ship in about 20 languages. By carefully choosing tools like word clustering, which are easy to build in many languages, instead of similar tools like taggers, which may not exist everywhere, we increase the chance of shipping. When we have actually built complete systems and given them to users, we have found several new and interesting problems in the most exciting way: by shipping solutions that don't work the first time around.

Bio
Joshua Goodman is a Principal Researcher in the Machine Learning and Applied Statistics group at Microsoft Research, where he runs a team focused on Learning for Messaging and Adversarial Problems. Spam filters he helped develop stop over a billion spam messages per day. He has also worked on language modeling and machine learning, and he holds a Ph.D. in Computer Science from Harvard University for his work on statistical parsing. He helped found, and is now President of, the Conference on Email and Anti-Spam (CEAS).
Publication date: 2006